from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler,MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from ELM import ELM
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Dense,Dropout
from tensorflow.keras.callbacks import EarlyStopping
np.random.seed(42)
print("Aktualnie ustawiony seed:", np.random.get_state()[1][0])
Aktualnie ustawiony seed: 42
data = pd.read_csv('diagnosed_cbc_data_v4.csv')
1. Preliminary analysis¶
data
| WBC | LYMp | NEUTp | LYMn | NEUTn | RBC | HGB | HCT | MCV | MCH | MCHC | PLT | PDW | PCT | Diagnosis | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10.00 | 43.200 | 50.100 | 4.30000 | 5.00000 | 2.77 | 7.3 | 24.2000 | 87.7 | 26.3 | 30.1 | 189.0 | 12.500000 | 0.17000 | Normocytic hypochromic anemia |
| 1 | 10.00 | 42.400 | 52.300 | 4.20000 | 5.30000 | 2.84 | 7.3 | 25.0000 | 88.2 | 25.7 | 20.2 | 180.0 | 12.500000 | 0.16000 | Normocytic hypochromic anemia |
| 2 | 7.20 | 30.700 | 60.700 | 2.20000 | 4.40000 | 3.97 | 9.0 | 30.5000 | 77.0 | 22.6 | 29.5 | 148.0 | 14.300000 | 0.14000 | Iron deficiency anemia |
| 3 | 6.00 | 30.200 | 63.500 | 1.80000 | 3.80000 | 4.22 | 3.8 | 32.8000 | 77.9 | 23.2 | 29.8 | 143.0 | 11.300000 | 0.12000 | Iron deficiency anemia |
| 4 | 4.20 | 39.100 | 53.700 | 1.60000 | 2.30000 | 3.93 | 0.4 | 316.0000 | 80.6 | 23.9 | 29.7 | 236.0 | 12.800000 | 0.22000 | Normocytic hypochromic anemia |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1276 | 4.40 | 25.845 | 77.511 | 1.88076 | 5.14094 | 4.86 | 13.5 | 46.1526 | 80.7 | 27.7 | 34.4 | 180.0 | 14.312512 | 0.26028 | Healthy |
| 1277 | 5.60 | 25.845 | 77.511 | 1.88076 | 5.14094 | 4.85 | 15.0 | 46.1526 | 91.7 | 31.0 | 33.8 | 215.0 | 14.312512 | 0.26028 | Healthy |
| 1278 | 9.20 | 25.845 | 77.511 | 1.88076 | 5.14094 | 4.47 | 13.1 | 46.1526 | 88.7 | 29.3 | 33.0 | 329.0 | 14.312512 | 0.26028 | Healthy |
| 1279 | 6.48 | 25.845 | 77.511 | 1.88076 | 5.14094 | 4.75 | 13.2 | 46.1526 | 86.7 | 27.9 | 32.1 | 174.0 | 14.312512 | 0.26028 | Healthy |
| 1280 | 8.80 | 25.845 | 77.511 | 1.88076 | 5.14094 | 4.95 | 15.2 | 46.1526 | 89.7 | 30.6 | 34.2 | 279.0 | 14.312512 | 0.26028 | Healthy |
1281 rows × 15 columns
The table consists of 1281 rows and 16 columns, describing various blood parameters and medical diagnoses. Here is a description of each column:
- WBC - White Blood Cell count.
- LYMp - Lymphocyte percentage.
- NEUTp - Neutrophil percentage.
- LYMn - Lymphocyte count.
- NEUTn - Neutrophil count.
- RBC - Red Blood Cell count.
- HGB - Hemoglobin concentration.
- HCT - Hematocrit.
- MCV - Mean Corpuscular Volume.
- MCH - Mean Corpuscular Hemoglobin.
- MCHC - Mean Corpuscular Hemoglobin Concentration.
- PLT - Platelet count.
- PDW - Platelet Distribution Width.
- PCT - Plateletcrit.
- Diagnosis - Medical diagnosis.
The Diagnosis column contains values:
Healthy - Indicates that the individual's blood test results are within normal ranges, showing no signs of anemia, infection, or other hematological disorders.
Normocytic hypochromic anemia - Anemia characterized by red blood cells (RBCs) that are of normal size (normocytic) but have less hemoglobin than normal (hypochromic). This can occur in conditions such as chronic disease anemia or early iron deficiency anemia.
Normocytic normochromic anemia - Anemia where the RBCs are normal in size and hemoglobin content but reduced in number. This type is often seen in chronic diseases, acute blood loss, or bone marrow disorders.
Iron deficiency anemia - Anemia caused by a lack of iron, which is necessary for the production of hemoglobin. This results in microcytic (small) and hypochromic (pale) RBCs. Common causes include blood loss, poor diet, or malabsorption.
Thrombocytopenia - A condition characterized by a low platelet count, which can lead to increased bleeding and bruising. Causes include bone marrow disorders, autoimmune diseases, and certain medications.
Other microcytic anemia - Anemia with RBCs that are smaller than normal (microcytic), not specifically classified under iron deficiency. It can be caused by conditions like thalassemia or chronic disease.
Leukemia - A type of cancer that affects blood and bone marrow, leading to an overproduction of abnormal white blood cells. Symptoms can include fatigue, frequent infections, and easy bruising.
Macrocytic anemia - Anemia where the RBCs are larger than normal (macrocytic). It is often caused by deficiencies in vitamin B12 or folate, and it can also be due to alcoholism, liver disease, or certain medications.
Leukemia with thrombocytopenia - A condition where an individual has both leukemia and a low platelet count. This combination can exacerbate the symptoms of both conditions, such as increased bleeding risk and fatigue.
data['Diagnosis'].value_counts()
Diagnosis Healthy 336 Normocytic hypochromic anemia 279 Normocytic normochromic anemia 269 Iron deficiency anemia 189 Thrombocytopenia 73 Other microcytic anemia 59 Leukemia 47 Macrocytic anemia 18 Leukemia with thrombocytopenia 11 Name: count, dtype: int64
The obvious problem of this data can be, that some of the types of anemia has a very little amount of occurence. Let's try to look on some visualisations, which can give us some informations.
for col in data.columns:
if col != 'Diagnosis':
plt.figure(figsize=(10, 6))
sns.histplot(data=data, x=col, hue='Diagnosis', bins=100)
plt.title(f'Histogram for column {col} relative to Diagnosis.')
plt.show()
As we can see, a considerable number of features, such as PCT, MCH, and HCT, are quite difficult to interpret and may be of little use. However, there are features that show greater differences across various diagnoses, such as PLT and WBC. For now, let's simplify our modeling by encoding all 'Healthy' diagnoses as Healthy (1) and all other records as Unhealthy (0), and then look at the same visualization again.
data.loc[data['Diagnosis'] == 'Healthy', 'Healthy'] = 1
data.loc[data['Diagnosis'] != 'Healthy', 'Healthy'] = 0
data.head()
| WBC | LYMp | NEUTp | LYMn | NEUTn | RBC | HGB | HCT | MCV | MCH | MCHC | PLT | PDW | PCT | Diagnosis | Healthy | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10.0 | 43.2 | 50.1 | 4.3 | 5.0 | 2.77 | 7.3 | 24.2 | 87.7 | 26.3 | 30.1 | 189.0 | 12.5 | 0.17 | Normocytic hypochromic anemia | 0.0 |
| 1 | 10.0 | 42.4 | 52.3 | 4.2 | 5.3 | 2.84 | 7.3 | 25.0 | 88.2 | 25.7 | 20.2 | 180.0 | 12.5 | 0.16 | Normocytic hypochromic anemia | 0.0 |
| 2 | 7.2 | 30.7 | 60.7 | 2.2 | 4.4 | 3.97 | 9.0 | 30.5 | 77.0 | 22.6 | 29.5 | 148.0 | 14.3 | 0.14 | Iron deficiency anemia | 0.0 |
| 3 | 6.0 | 30.2 | 63.5 | 1.8 | 3.8 | 4.22 | 3.8 | 32.8 | 77.9 | 23.2 | 29.8 | 143.0 | 11.3 | 0.12 | Iron deficiency anemia | 0.0 |
| 4 | 4.2 | 39.1 | 53.7 | 1.6 | 2.3 | 3.93 | 0.4 | 316.0 | 80.6 | 23.9 | 29.7 | 236.0 | 12.8 | 0.22 | Normocytic hypochromic anemia | 0.0 |
data['Healthy'].value_counts()
Healthy 0.0 945 1.0 336 Name: count, dtype: int64
Given the values displayed in the previous cells, these data are still not ideal for us due to a significant imbalance. There are over three times more individuals with 'Healthy = 0' than 'Healthy = 1'. This imbalance may impact the performance of our model, but we are not concerned about it.
for col in data.drop(columns='Diagnosis').columns:
if col != 'Healthy':
plt.figure(figsize=(10, 6))
sns.histplot(data=data, x=col, hue='Healthy', bins=100)
plt.title(f'Histogram for column {col} relative to Diagnosis.')
plt.show()
What has happened here is that we have ensured that columns such as:
- PLT,
- WBC,
- HGB
play a leading role in our predictions. Let's see how our observations reflect the visualization of the correlation matrix using a heatmap.
plt.figure(figsize=(16,8))
sns.heatmap(data.drop(columns='Diagnosis').corr(), annot=True)
<AxesSubplot:>
The high correlation between Healthy and PLT or HGB only reinforces our suspicions. Let's now gradually move on to building our model (or models). Of course, let's start by dividing our data into independent variables and dependent variables. Remember that for now, we will focus on binary classification, so let's not forget to remove the Diagnosis column in the process.
X,y = data.drop(columns=['Diagnosis','Healthy']), data['Healthy']
Before we proceed to build the model correctly, let's check if we can somehow describe our dependent variables using two features. Here, PCA (Principal Component Analysis) will come to our aid, which will extract the most important information from our data. However, before we even apply PCA, we need to start with some data standardization, as PCA itself is very sensitive to it. Here, we'll use the StandardScaler function, which will transform our data in such a way that they have a mean equal to 0 and a standard deviation equal to 1. Then, we'll use the PCA function setting the parameter n_components = 2, allowing our unsupervised learning algorithm to extract two main feature components from the entire dataset. In both cases, we'll use the fit_transform() function right away because for now, we're operating on the entire dataset, not the divided one. Finally, we'll visualize our data.
X_scaled = StandardScaler().fit_transform(X)
X_scaled_pca = PCA(n_components=2).fit_transform(X_scaled)
data_plot = pd.DataFrame(data=X_scaled_pca, columns=['PC1', 'PC2'])
data_plot['Healthy'] = y
plt.figure(figsize=(8, 6))
sns.scatterplot(data=data_plot, x='PC1', y='PC2', hue='Healthy')
plt.title('PCA Chart')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend()
plt.show()
2. Binary classification¶
As you can easily notice, distinguishing our two classes using only two features is practically impossible. We could have expected this by looking at the previous histplots, however, it was good to make sure.
Now we move on to the proper model construction. We start by splitting our dataset into training data and test data.
X_train, X_test, y_train, y_test = train_test_split(X,y,random_state=42, stratify=y)
X_train.shape, X_test.shape
((960, 14), (321, 14))
At the very beginning, we will assess the performance of logistic regression, testing several examples with different values of the regularization parameter.
pipeline = make_pipeline(
StandardScaler(),
PCA(),
LogisticRegression()
)
param_grid =[
{
'pca__n_components': [4, 6, 8, 10, 12, 14],
'logisticregression__C': [0.001, 0.01, 0.1, 1, 10, 100, 1000],
}
]
gridsearch1 = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=10).fit(X_train,y_train)
gridsearch1.best_estimator_
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=12)),
('logisticregression', LogisticRegression(C=100))])In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=12)),
('logisticregression', LogisticRegression(C=100))])StandardScaler()
PCA(n_components=12)
LogisticRegression(C=100)
gridsearch1.best_score_
0.8979166666666666
gridsearch1.score(X_test,y_test)
0.8691588785046729
y_scores = gridsearch1.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()
plt.figure(figsize=(10,5))
sns.heatmap(confusion_matrix(y_test,gridsearch1.predict(X_test)), annot=True, fmt='d')
plt.xlabel('Predict')
plt.ylabel('Actuval Value')
Text(95.72222222222221, 0.5, 'Actuval Value')
As we can see, logistic regression alone can achieve a fairly high score, reaching up to 87% on the training set. This is already quite satisfactory for us. However, let's try something more complex. We move on to Support Vector Machines (SVM), which in the scikit-learn package are defined as SVR for regression and SVC for classification. To expand our test panel, we won't rely on just one type of kernel. We'll use the RBF kernel, based on the radial basis function, as well as the linear kernel. The grid of parameters created is quite extensive, which may lead to longer waiting times for the final result.
pipeline1 = make_pipeline(
StandardScaler(),
PCA(),
SVC(probability=True)
)
param_grid =[
{
'pca__n_components': [10, 12, 14],
"svc__kernel": ['rbf'],
'svc__C': [0.01, 0.1, 1, 10, 50, 100],
'svc__gamma': [0.001, 0.01, 0.1, 1, 10, 50]
},
{
'pca__n_components': [10, 12, 14],
'svc__kernel': ['linear'],
'svc__C': [0.001, 0.01, 0.1, 1, 10, 50, 100]
}
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=10).fit(X_train,y_train)
gridsearch1.best_estimator_
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc', SVC(C=50, gamma=0.1, probability=True))])In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc', SVC(C=50, gamma=0.1, probability=True))])StandardScaler()
PCA(n_components=14)
SVC(C=50, gamma=0.1, probability=True)
gridsearch1.best_score_
0.9541666666666666
gridsearch1.score(X_test,y_test)
0.956386292834891
plt.figure(figsize=(10,5))
sns.heatmap(confusion_matrix(y_test,gridsearch1.predict(X_test)), annot=True, fmt='d')
plt.xlabel('Predict')
plt.ylabel('Actuval Value')
Text(95.72222222222221, 0.5, 'Actuval Value')
y_scores = gridsearch1.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()
As we can see, right from the start, we obtained a model that has over 95% accuracy on the test set, which may already be quite satisfactory for us. Our model selected the version with 14 components, which is the same number of initial features. However, there's no time to rest on our laurels; let's visualize how this accuracy looks against all parameters. It seems to me that the best idea would be to visualize specifically for n_components = 14, so we'll adopt that tactic.
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14].dropna() #dropna usuwa nam wiersze gdzie nie ma gammy, czyli te linearne svm
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")
sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
What conclusions can we draw from the above analysis? Certainly, we can see that we can shift significantly to the "right" in terms of the C parameter values. It is in this direction that the accuracy of models increases, so we can expect that with an increase in the C parameter, our model should be more accurate. Let's check our conclusions in practice.
param_grid =[
{
'pca__n_components': [12,13,14],
"svc__kernel": ['rbf'],
'svc__C': [10, 50, 100 , 500, 1000, 5000, 10000],
'svc__gamma': [0.00001,0.0001,0.001, 0.01, 0.1, 1, 10]
},
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=10).fit(X_train,y_train)
gridsearch1.best_estimator_
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc', SVC(C=5000, gamma=0.01, probability=True))])In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc', SVC(C=5000, gamma=0.01, probability=True))])StandardScaler()
PCA(n_components=14)
SVC(C=5000, gamma=0.01, probability=True)
gridsearch1.best_score_
0.9572916666666668
gridsearch1.score(X_test,y_test)
0.9470404984423676
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")
sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
y_scores = gridsearch1.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()
What results did we get? Theoretically, the model improved, but it improved on the training set, which doesn't entirely satisfy us. We would prefer our model to generalize as well as possible rather than perform optimally on the training set. However, there's no need to lose hope; let's try manipulating the grid search parameters a bit more, now focusing only on n_components = 14.
param_grid =[
{
'pca__n_components': [14],
"svc__kernel": ['rbf'],
'svc__C': np.arange(1000,11000,1000),
'svc__gamma': np.linspace(0.001, 0.1,10)
},
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=5).fit(X_train,y_train)
gridsearch1.best_estimator_
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc',
SVC(C=1000, gamma=0.023000000000000003, probability=True))])In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc',
SVC(C=1000, gamma=0.023000000000000003, probability=True))])StandardScaler()
PCA(n_components=14)
SVC(C=1000, gamma=0.023000000000000003, probability=True)
gridsearch1.score(X_test,y_test)
0.9532710280373832
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")
sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
param_grid =[
{
'pca__n_components': [14],
"svc__kernel": ['rbf'],
'svc__C': np.linspace(500,1100,10),
'svc__gamma': np.linspace(0.01, 0.04,10)
},
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=5).fit(X_train,y_train)
gridsearch1.best_estimator_
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc',
SVC(C=766.6666666666667, gamma=0.026666666666666665,
probability=True))])In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc',
SVC(C=766.6666666666667, gamma=0.026666666666666665,
probability=True))])StandardScaler()
PCA(n_components=14)
SVC(C=766.6666666666667, gamma=0.026666666666666665, probability=True)
gridsearch1.score(X_test,y_test)
0.956386292834891
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")
sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
param_grid =[
{
'pca__n_components': [14],
"svc__kernel": ['rbf'],
'svc__C': np.linspace(6000,10000,10),
'svc__gamma': np.linspace(0.05, 0.2,10)
},
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=5).fit(X_train,y_train)
gridsearch1.best_estimator_
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc',
SVC(C=6000.0, gamma=0.11666666666666668, probability=True))])In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc',
SVC(C=6000.0, gamma=0.11666666666666668, probability=True))])StandardScaler()
PCA(n_components=14)
SVC(C=6000.0, gamma=0.11666666666666668, probability=True)
gridsearch1.score(X_test,y_test)
0.956386292834891
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")
sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
param_grid =[
{
'pca__n_components': [14],
"svc__kernel": ['rbf'],
'svc__C': np.linspace(3000,13000,15),
'svc__gamma': np.linspace(0.05, 0.2,10)
},
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=5).fit(X_train,y_train)
gridsearch1.best_estimator_
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc',
SVC(C=3000.0, gamma=0.15000000000000002, probability=True))])In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc',
SVC(C=3000.0, gamma=0.15000000000000002, probability=True))])StandardScaler()
PCA(n_components=14)
SVC(C=3000.0, gamma=0.15000000000000002, probability=True)
gridsearch1.score(X_test,y_test)
0.9532710280373832
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")
plt.figure(figsize=(20,10))
sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
param_grid =[
{
'pca__n_components': [14],
"svc__kernel": ['rbf'],
'svc__C': np.linspace(3000,100000000,30),
'svc__gamma': np.linspace(0.05, 0.2,10)
},
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=10).fit(X_train,y_train)
gridsearch1.best_estimator_
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc',
SVC(C=3000.0, gamma=0.16666666666666669, probability=True))])In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc',
SVC(C=3000.0, gamma=0.16666666666666669, probability=True))])StandardScaler()
PCA(n_components=14)
SVC(C=3000.0, gamma=0.16666666666666669, probability=True)
gridsearch1.score(X_test,y_test)
0.956386292834891
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")
plt.figure(figsize=(20,10))
sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
plt.figure(figsize=(10,5))
sns.heatmap(confusion_matrix(y_test,gridsearch1.predict(X_test)), annot=True, fmt='d')
plt.xlabel('Predict')
plt.ylabel('Actuval Value')
Text(95.72222222222221, 0.5, 'Actuval Value')
y_scores = gridsearch1.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()
After trying several different configurations, on various intervals, with different numbers of steps, where did it lead us? Unfortunately, we didn't find a better model than our initial one, but there's no need to be discouraged because the model is still satisfactory, if not very good. An accuracy level of nearly 96% is impressive. Now we can try using the TensorFlow package and create a neural network based on gradient-based methods.
model = Sequential()
model.add(layers.Flatten(input_shape=(14,1)))
model.add(layers.Dense(200,activation='relu'))
model.add(layers.Dense(2, activation='softmax'))
model.summary()
/home/pawel/.local/lib/python3.10/site-packages/keras/src/layers/reshaping/flatten.py:37: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead. super().__init__(**kwargs) 2024-06-09 14:15:14.526827: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓ ┃ Layer (type) ┃ Output Shape ┃ Param # ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩ │ flatten (Flatten) │ (None, 14) │ 0 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense (Dense) │ (None, 200) │ 3,000 │ ├─────────────────────────────────┼────────────────────────┼───────────────┤ │ dense_1 (Dense) │ (None, 2) │ 402 │ └─────────────────────────────────┴────────────────────────┴───────────────┘
Total params: 3,402 (13.29 KB)
Trainable params: 3,402 (13.29 KB)
Non-trainable params: 0 (0.00 B)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
scalar = StandardScaler().fit(X_train)
X_train_scaled = scalar.transform(X_train)
X_test_scaled = scalar.transform(X_test)
history = model.fit(X_train_scaled,y_train, batch_size=1024,epochs=100, validation_data=(X_test_scaled,y_test))
Epoch 1/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 84ms/step - accuracy: 0.6177 - loss: 0.6454 - val_accuracy: 0.6573 - val_loss: 0.6425 Epoch 2/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 37ms/step - accuracy: 0.6323 - loss: 0.6399 - val_accuracy: 0.6885 - val_loss: 0.6362 Epoch 3/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step - accuracy: 0.6646 - loss: 0.6324 - val_accuracy: 0.6947 - val_loss: 0.6288 Epoch 4/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 58ms/step - accuracy: 0.6958 - loss: 0.6236 - val_accuracy: 0.6916 - val_loss: 0.6207 Epoch 5/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 40ms/step - accuracy: 0.7188 - loss: 0.6141 - val_accuracy: 0.7103 - val_loss: 0.6121 Epoch 6/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step - accuracy: 0.7260 - loss: 0.6041 - val_accuracy: 0.7227 - val_loss: 0.6033 Epoch 7/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 0.7292 - loss: 0.5939 - val_accuracy: 0.7321 - val_loss: 0.5945 Epoch 8/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 40ms/step - accuracy: 0.7302 - loss: 0.5837 - val_accuracy: 0.7383 - val_loss: 0.5858 Epoch 9/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.7333 - loss: 0.5736 - val_accuracy: 0.7383 - val_loss: 0.5772 Epoch 10/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.7365 - loss: 0.5637 - val_accuracy: 0.7321 - val_loss: 0.5688 Epoch 11/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 56ms/step - accuracy: 0.7354 - loss: 0.5541 - val_accuracy: 0.7321 - val_loss: 0.5606 Epoch 12/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 60ms/step - accuracy: 0.7354 - loss: 0.5448 - val_accuracy: 0.7352 - val_loss: 0.5528 Epoch 13/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 0.7354 - loss: 0.5359 - val_accuracy: 0.7352 - val_loss: 0.5453 Epoch 14/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.7354 - loss: 0.5274 - val_accuracy: 0.7352 - val_loss: 0.5381 Epoch 15/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 0.7354 - loss: 0.5193 - val_accuracy: 0.7352 - val_loss: 0.5312 Epoch 16/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 0.7354 - loss: 0.5116 - val_accuracy: 0.7352 - val_loss: 0.5247 Epoch 17/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.7354 - loss: 0.5042 - val_accuracy: 0.7352 - val_loss: 0.5186 Epoch 18/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.7344 - loss: 0.4973 - val_accuracy: 0.7352 - val_loss: 0.5127 Epoch 19/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.7344 - loss: 0.4906 - val_accuracy: 0.7352 - val_loss: 0.5071 Epoch 20/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - accuracy: 0.7344 - loss: 0.4843 - val_accuracy: 0.7352 - val_loss: 0.5018 Epoch 21/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step - accuracy: 0.7344 - loss: 0.4783 - val_accuracy: 0.7352 - val_loss: 0.4967 Epoch 22/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 53ms/step - accuracy: 0.7344 - loss: 0.4726 - val_accuracy: 0.7352 - val_loss: 0.4919 Epoch 23/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 0.7354 - loss: 0.4671 - val_accuracy: 0.7352 - val_loss: 0.4874 Epoch 24/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 0.7354 - loss: 0.4619 - val_accuracy: 0.7352 - val_loss: 0.4830 Epoch 25/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 0.7354 - loss: 0.4570 - val_accuracy: 0.7352 - val_loss: 0.4788 Epoch 26/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.7354 - loss: 0.4523 - val_accuracy: 0.7352 - val_loss: 0.4749 Epoch 27/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.7354 - loss: 0.4477 - val_accuracy: 0.7352 - val_loss: 0.4711 Epoch 28/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step - accuracy: 0.7354 - loss: 0.4434 - val_accuracy: 0.7352 - val_loss: 0.4674 Epoch 29/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step - accuracy: 0.7354 - loss: 0.4393 - val_accuracy: 0.7352 - val_loss: 0.4640 Epoch 30/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.7354 - loss: 0.4353 - val_accuracy: 0.7352 - val_loss: 0.4607 Epoch 31/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.7365 - loss: 0.4315 - val_accuracy: 0.7352 - val_loss: 0.4575 Epoch 32/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 56ms/step - accuracy: 0.7365 - loss: 0.4278 - val_accuracy: 0.7352 - val_loss: 0.4545 Epoch 33/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 56ms/step - accuracy: 0.7365 - loss: 0.4242 - val_accuracy: 0.7383 - val_loss: 0.4515 Epoch 34/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.7365 - loss: 0.4208 - val_accuracy: 0.7445 - val_loss: 0.4487 Epoch 35/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 0.7385 - loss: 0.4175 - val_accuracy: 0.7508 - val_loss: 0.4459 Epoch 36/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.7396 - loss: 0.4143 - val_accuracy: 0.7508 - val_loss: 0.4432 Epoch 37/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 87ms/step - accuracy: 0.7406 - loss: 0.4111 - val_accuracy: 0.7508 - val_loss: 0.4406 Epoch 38/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 98ms/step - accuracy: 0.7427 - loss: 0.4081 - val_accuracy: 0.7539 - val_loss: 0.4380 Epoch 39/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - accuracy: 0.7552 - loss: 0.4051 - val_accuracy: 0.7570 - val_loss: 0.4356 Epoch 40/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.7563 - loss: 0.4022 - val_accuracy: 0.7664 - val_loss: 0.4332 Epoch 41/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.7604 - loss: 0.3994 - val_accuracy: 0.7695 - val_loss: 0.4310 Epoch 42/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step - accuracy: 0.7677 - loss: 0.3967 - val_accuracy: 0.7819 - val_loss: 0.4288 Epoch 43/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 55ms/step - accuracy: 0.7760 - loss: 0.3940 - val_accuracy: 0.7882 - val_loss: 0.4267 Epoch 44/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.7865 - loss: 0.3914 - val_accuracy: 0.7944 - val_loss: 0.4247 Epoch 45/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 59ms/step - accuracy: 0.7979 - loss: 0.3889 - val_accuracy: 0.8006 - val_loss: 0.4227 Epoch 46/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 0.8094 - loss: 0.3864 - val_accuracy: 0.8069 - val_loss: 0.4208 Epoch 47/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 0.8167 - loss: 0.3839 - val_accuracy: 0.8100 - val_loss: 0.4189 Epoch 48/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step - accuracy: 0.8260 - loss: 0.3815 - val_accuracy: 0.8224 - val_loss: 0.4171 Epoch 49/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.8323 - loss: 0.3791 - val_accuracy: 0.8255 - val_loss: 0.4154 Epoch 50/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.8448 - loss: 0.3767 - val_accuracy: 0.8318 - val_loss: 0.4137 Epoch 51/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 60ms/step - accuracy: 0.8490 - loss: 0.3744 - val_accuracy: 0.8442 - val_loss: 0.4122 Epoch 52/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.8573 - loss: 0.3721 - val_accuracy: 0.8505 - val_loss: 0.4107 Epoch 53/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.8635 - loss: 0.3698 - val_accuracy: 0.8536 - val_loss: 0.4092 Epoch 54/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 55ms/step - accuracy: 0.8708 - loss: 0.3675 - val_accuracy: 0.8598 - val_loss: 0.4079 Epoch 55/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 66ms/step - accuracy: 0.8781 - loss: 0.3653 - val_accuracy: 0.8692 - val_loss: 0.4065 Epoch 56/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.8844 - loss: 0.3630 - val_accuracy: 0.8754 - val_loss: 0.4052 Epoch 57/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.8844 - loss: 0.3608 - val_accuracy: 0.8754 - val_loss: 0.4040 Epoch 58/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.8875 - loss: 0.3587 - val_accuracy: 0.8816 - val_loss: 0.4028 Epoch 59/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.8896 - loss: 0.3565 - val_accuracy: 0.8816 - val_loss: 0.4016 Epoch 60/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step - accuracy: 0.8906 - loss: 0.3543 - val_accuracy: 0.8816 - val_loss: 0.4005 Epoch 61/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.8938 - loss: 0.3522 - val_accuracy: 0.8847 - val_loss: 0.3995 Epoch 62/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 58ms/step - accuracy: 0.8969 - loss: 0.3501 - val_accuracy: 0.8879 - val_loss: 0.3984 Epoch 63/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.8969 - loss: 0.3480 - val_accuracy: 0.8879 - val_loss: 0.3974 Epoch 64/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.8990 - loss: 0.3459 - val_accuracy: 0.8879 - val_loss: 0.3963 Epoch 65/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.9031 - loss: 0.3438 - val_accuracy: 0.8879 - val_loss: 0.3953 Epoch 66/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 0.9021 - loss: 0.3418 - val_accuracy: 0.8879 - val_loss: 0.3942 Epoch 67/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.9031 - loss: 0.3397 - val_accuracy: 0.8879 - val_loss: 0.3932 Epoch 68/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 0.9031 - loss: 0.3377 - val_accuracy: 0.8879 - val_loss: 0.3921 Epoch 69/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step - accuracy: 0.9042 - loss: 0.3357 - val_accuracy: 0.8879 - val_loss: 0.3911 Epoch 70/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.9052 - loss: 0.3337 - val_accuracy: 0.8910 - val_loss: 0.3901 Epoch 71/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step - accuracy: 0.9052 - loss: 0.3317 - val_accuracy: 0.8910 - val_loss: 0.3891 Epoch 72/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 0.9062 - loss: 0.3298 - val_accuracy: 0.8910 - val_loss: 0.3882 Epoch 73/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.9062 - loss: 0.3278 - val_accuracy: 0.8910 - val_loss: 0.3872 Epoch 74/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.9052 - loss: 0.3259 - val_accuracy: 0.8910 - val_loss: 0.3862 Epoch 75/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 57ms/step - accuracy: 0.9052 - loss: 0.3240 - val_accuracy: 0.8910 - val_loss: 0.3853 Epoch 76/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.9062 - loss: 0.3221 - val_accuracy: 0.8879 - val_loss: 0.3843 Epoch 77/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 0.9073 - loss: 0.3202 - val_accuracy: 0.8879 - val_loss: 0.3833 Epoch 78/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 53ms/step - accuracy: 0.9083 - loss: 0.3184 - val_accuracy: 0.8879 - val_loss: 0.3823 Epoch 79/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step - accuracy: 0.9083 - loss: 0.3165 - val_accuracy: 0.8879 - val_loss: 0.3813 Epoch 80/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - accuracy: 0.9083 - loss: 0.3147 - val_accuracy: 0.8910 - val_loss: 0.3803 Epoch 81/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 50ms/step - accuracy: 0.9073 - loss: 0.3129 - val_accuracy: 0.8879 - val_loss: 0.3793 Epoch 82/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 54ms/step - accuracy: 0.9052 - loss: 0.3110 - val_accuracy: 0.8879 - val_loss: 0.3783 Epoch 83/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.9062 - loss: 0.3092 - val_accuracy: 0.8879 - val_loss: 0.3772 Epoch 84/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 75ms/step - accuracy: 0.9062 - loss: 0.3074 - val_accuracy: 0.8879 - val_loss: 0.3760 Epoch 85/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 119ms/step - accuracy: 0.9083 - loss: 0.3057 - val_accuracy: 0.8847 - val_loss: 0.3749 Epoch 86/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.9083 - loss: 0.3039 - val_accuracy: 0.8847 - val_loss: 0.3738 Epoch 87/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step - accuracy: 0.9083 - loss: 0.3022 - val_accuracy: 0.8847 - val_loss: 0.3727 Epoch 88/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.9073 - loss: 0.3004 - val_accuracy: 0.8847 - val_loss: 0.3715 Epoch 89/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - accuracy: 0.9073 - loss: 0.2987 - val_accuracy: 0.8847 - val_loss: 0.3703 Epoch 90/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.9073 - loss: 0.2970 - val_accuracy: 0.8847 - val_loss: 0.3691 Epoch 91/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - accuracy: 0.9073 - loss: 0.2954 - val_accuracy: 0.8847 - val_loss: 0.3679 Epoch 92/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 50ms/step - accuracy: 0.9073 - loss: 0.2937 - val_accuracy: 0.8816 - val_loss: 0.3668 Epoch 93/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step - accuracy: 0.9062 - loss: 0.2921 - val_accuracy: 0.8816 - val_loss: 0.3656 Epoch 94/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step - accuracy: 0.9052 - loss: 0.2905 - val_accuracy: 0.8816 - val_loss: 0.3644 Epoch 95/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step - accuracy: 0.9052 - loss: 0.2889 - val_accuracy: 0.8816 - val_loss: 0.3633 Epoch 96/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 53ms/step - accuracy: 0.9052 - loss: 0.2873 - val_accuracy: 0.8816 - val_loss: 0.3623 Epoch 97/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - accuracy: 0.9062 - loss: 0.2857 - val_accuracy: 0.8816 - val_loss: 0.3613 Epoch 98/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 0.9052 - loss: 0.2842 - val_accuracy: 0.8816 - val_loss: 0.3603 Epoch 99/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 58ms/step - accuracy: 0.9062 - loss: 0.2826 - val_accuracy: 0.8816 - val_loss: 0.3594 Epoch 100/100 1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step - accuracy: 0.9073 - loss: 0.2811 - val_accuracy: 0.8816 - val_loss: 0.3586
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
[<matplotlib.lines.Line2D at 0x7965d10c3340>]
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
[<matplotlib.lines.Line2D at 0x7965d1a92e30>]
model.evaluate(X_test_scaled,y_test)
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8901 - loss: 0.3365
[0.35858526825904846, 0.881619930267334]
model = Sequential()
model.add(layers.Flatten(input_shape=(14,1)))
model.add(layers.Dense(14,activation='relu'))
model.add(Dropout(0.1))
model.add(layers.Dense(7,activation='relu'))
model.add(Dropout(0.1))
model.add(layers.Dense(3,activation='relu'))
model.add(Dropout(0.1))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss = 'binary_crossentropy', optimizer='adam', metrics=['accuracy'])
/home/pawel/.local/lib/python3.10/site-packages/keras/src/layers/reshaping/flatten.py:37: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead. super().__init__(**kwargs)
early_stop = EarlyStopping(monitor = 'val_loss',mode='min',verbose=1,patience=25)
model.fit(x = X_train_scaled,y = y_train,batch_size=2,validation_data=(X_test_scaled,y_test),epochs=600,callbacks=[early_stop])
Epoch 1/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.5999 - loss: 0.7037 - val_accuracy: 0.8629 - val_loss: 0.5071 Epoch 2/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8463 - loss: 0.4758 - val_accuracy: 0.8941 - val_loss: 0.3558 Epoch 3/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8610 - loss: 0.3600 - val_accuracy: 0.8941 - val_loss: 0.3922 Epoch 4/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8592 - loss: 0.3032 - val_accuracy: 0.9159 - val_loss: 0.4287 Epoch 5/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8646 - loss: 0.2865 - val_accuracy: 0.8972 - val_loss: 0.3573 Epoch 6/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8874 - loss: 0.2465 - val_accuracy: 0.9128 - val_loss: 0.3446 Epoch 7/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8511 - loss: 0.2765 - val_accuracy: 0.9159 - val_loss: 0.3460 Epoch 8/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8737 - loss: 0.2588 - val_accuracy: 0.9190 - val_loss: 0.3268 Epoch 9/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8805 - loss: 0.2401 - val_accuracy: 0.9252 - val_loss: 0.3315 Epoch 10/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9126 - loss: 0.2021 - val_accuracy: 0.9283 - val_loss: 0.3231 Epoch 11/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8939 - loss: 0.2384 - val_accuracy: 0.9283 - val_loss: 0.3096 Epoch 12/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9131 - loss: 0.1935 - val_accuracy: 0.9315 - val_loss: 0.3261 Epoch 13/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9008 - loss: 0.2062 - val_accuracy: 0.9346 - val_loss: 0.3041 Epoch 14/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8996 - loss: 0.2066 - val_accuracy: 0.9283 - val_loss: 0.3168 Epoch 15/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9188 - loss: 0.2099 - val_accuracy: 0.9283 - val_loss: 0.3238 Epoch 16/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9219 - loss: 0.1771 - val_accuracy: 0.9315 - val_loss: 0.2851 Epoch 17/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9144 - loss: 0.1857 - val_accuracy: 0.9252 - val_loss: 0.2705 Epoch 18/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9481 - loss: 0.1627 - val_accuracy: 0.9408 - val_loss: 0.2776 Epoch 19/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9267 - loss: 0.1887 - val_accuracy: 0.9377 - val_loss: 0.3052 Epoch 20/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9472 - loss: 0.1389 - val_accuracy: 0.9439 - val_loss: 0.2388 Epoch 21/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9256 - loss: 0.1715 - val_accuracy: 0.9408 - val_loss: 0.1976 Epoch 22/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9435 - loss: 0.1519 - val_accuracy: 0.9377 - val_loss: 0.2005 Epoch 23/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9292 - loss: 0.1515 - val_accuracy: 0.9408 - val_loss: 0.2068 Epoch 24/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9555 - loss: 0.1073 - val_accuracy: 0.9439 - val_loss: 0.1894 Epoch 25/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9668 - loss: 0.1221 - val_accuracy: 0.9533 - val_loss: 0.2097 Epoch 26/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9578 - loss: 0.1224 - val_accuracy: 0.9408 - val_loss: 0.1786 Epoch 27/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9711 - loss: 0.1062 - val_accuracy: 0.9564 - val_loss: 0.1781 Epoch 28/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9709 - loss: 0.1005 - val_accuracy: 0.9377 - val_loss: 0.1857 Epoch 29/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9624 - loss: 0.1017 - val_accuracy: 0.9595 - val_loss: 0.1962 Epoch 30/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9713 - loss: 0.0832 - val_accuracy: 0.9626 - val_loss: 0.1755 Epoch 31/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9634 - loss: 0.1000 - val_accuracy: 0.9533 - val_loss: 0.1675 Epoch 32/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9658 - loss: 0.1134 - val_accuracy: 0.9626 - val_loss: 0.1135 Epoch 33/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9713 - loss: 0.0807 - val_accuracy: 0.9626 - val_loss: 0.1236 Epoch 34/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9736 - loss: 0.0991 - val_accuracy: 0.9688 - val_loss: 0.0994 Epoch 35/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9650 - loss: 0.0890 - val_accuracy: 0.9688 - val_loss: 0.0958 Epoch 36/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9701 - loss: 0.0918 - val_accuracy: 0.9595 - val_loss: 0.1252 Epoch 37/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9694 - loss: 0.1541 - val_accuracy: 0.9595 - val_loss: 0.1054 Epoch 38/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9759 - loss: 0.0865 - val_accuracy: 0.9626 - val_loss: 0.1012 Epoch 39/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9732 - loss: 0.0886 - val_accuracy: 0.9626 - val_loss: 0.1013 Epoch 40/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9826 - loss: 0.0719 - val_accuracy: 0.9657 - val_loss: 0.0985 Epoch 41/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9771 - loss: 0.0716 - val_accuracy: 0.9657 - val_loss: 0.1027 Epoch 42/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9758 - loss: 0.0709 - val_accuracy: 0.9626 - val_loss: 0.0946 Epoch 43/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9841 - loss: 0.0551 - val_accuracy: 0.9626 - val_loss: 0.1016 Epoch 44/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9828 - loss: 0.0740 - val_accuracy: 0.9595 - val_loss: 0.1850 Epoch 45/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9689 - loss: 0.0842 - val_accuracy: 0.9657 - val_loss: 0.1265 Epoch 46/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9751 - loss: 0.0869 - val_accuracy: 0.9657 - val_loss: 0.1114 Epoch 47/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9790 - loss: 0.0639 - val_accuracy: 0.9657 - val_loss: 0.1324 Epoch 48/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9796 - loss: 0.0783 - val_accuracy: 0.9657 - val_loss: 0.1028 Epoch 49/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9840 - loss: 0.0614 - val_accuracy: 0.9720 - val_loss: 0.0903 Epoch 50/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9791 - loss: 0.0792 - val_accuracy: 0.9720 - val_loss: 0.0905 Epoch 51/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9824 - loss: 0.0649 - val_accuracy: 0.9720 - val_loss: 0.0889 Epoch 52/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9770 - loss: 0.0684 - val_accuracy: 0.9720 - val_loss: 0.0880 Epoch 53/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9823 - loss: 0.0789 - val_accuracy: 0.9657 - val_loss: 0.1271 Epoch 54/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9789 - loss: 0.0647 - val_accuracy: 0.9657 - val_loss: 0.1772 Epoch 55/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9798 - loss: 0.0683 - val_accuracy: 0.9688 - val_loss: 0.1203 Epoch 56/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9890 - loss: 0.0439 - val_accuracy: 0.9688 - val_loss: 0.1985 Epoch 57/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9884 - loss: 0.0564 - val_accuracy: 0.9657 - val_loss: 0.0853 Epoch 58/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9850 - loss: 0.0577 - val_accuracy: 0.9688 - val_loss: 0.0905 Epoch 59/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9932 - loss: 0.0512 - val_accuracy: 0.9688 - val_loss: 0.0924 Epoch 60/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9790 - loss: 0.0668 - val_accuracy: 0.9720 - val_loss: 0.1118 Epoch 61/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9860 - loss: 0.0550 - val_accuracy: 0.9688 - val_loss: 0.1321 Epoch 62/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9831 - loss: 0.0583 - val_accuracy: 0.9657 - val_loss: 0.1553 Epoch 63/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9843 - loss: 0.0707 - val_accuracy: 0.9688 - val_loss: 0.0843 Epoch 64/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9698 - loss: 0.0783 - val_accuracy: 0.9688 - val_loss: 0.0974 Epoch 65/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9865 - loss: 0.0495 - val_accuracy: 0.9751 - val_loss: 0.0911 Epoch 66/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9809 - loss: 0.0686 - val_accuracy: 0.9720 - val_loss: 0.1043 Epoch 67/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9793 - loss: 0.0781 - val_accuracy: 0.9688 - val_loss: 0.1415 Epoch 68/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9828 - loss: 0.0571 - val_accuracy: 0.9688 - val_loss: 0.1699 Epoch 69/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9841 - loss: 0.0564 - val_accuracy: 0.9720 - val_loss: 0.1128 Epoch 70/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9924 - loss: 0.0394 - val_accuracy: 0.9720 - val_loss: 0.0981 Epoch 71/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9831 - loss: 0.0500 - val_accuracy: 0.9688 - val_loss: 0.1009 Epoch 72/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9871 - loss: 0.0457 - val_accuracy: 0.9720 - val_loss: 0.1099 Epoch 73/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9739 - loss: 0.0765 - val_accuracy: 0.9657 - val_loss: 0.1688 Epoch 74/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9868 - loss: 0.1122 - val_accuracy: 0.9688 - val_loss: 0.1504 Epoch 75/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9853 - loss: 0.0464 - val_accuracy: 0.9657 - val_loss: 0.1511 Epoch 76/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9786 - loss: 0.0698 - val_accuracy: 0.9688 - val_loss: 0.1556 Epoch 77/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9843 - loss: 0.0547 - val_accuracy: 0.9720 - val_loss: 0.1808 Epoch 78/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9907 - loss: 0.0304 - val_accuracy: 0.9720 - val_loss: 0.2295 Epoch 79/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9830 - loss: 0.0697 - val_accuracy: 0.9720 - val_loss: 0.1689 Epoch 80/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9905 - loss: 0.0355 - val_accuracy: 0.9657 - val_loss: 0.1695 Epoch 81/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9817 - loss: 0.0478 - val_accuracy: 0.9720 - val_loss: 0.2304 Epoch 82/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9961 - loss: 0.0264 - val_accuracy: 0.9720 - val_loss: 0.1329 Epoch 83/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9791 - loss: 0.0421 - val_accuracy: 0.9688 - val_loss: 0.1419 Epoch 84/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9842 - loss: 0.0492 - val_accuracy: 0.9751 - val_loss: 0.1054 Epoch 85/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9878 - loss: 0.0383 - val_accuracy: 0.9720 - val_loss: 0.1651 Epoch 86/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9916 - loss: 0.0585 - val_accuracy: 0.9688 - val_loss: 0.1481 Epoch 87/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9891 - loss: 0.0375 - val_accuracy: 0.9688 - val_loss: 0.2199 Epoch 88/600 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9847 - loss: 0.0471 - val_accuracy: 0.9657 - val_loss: 0.2256 Epoch 88: early stopping
<keras.src.callbacks.history.History at 0x7965cfbd8af0>
pd.DataFrame(model.history.history).plot()
<AxesSubplot:>
model.evaluate(X_test_scaled,y_test)
1/11 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - accuracy: 1.0000 - loss: 0.001111/11 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9717 - loss: 0.2331
[0.22563399374485016, 0.9657320976257324]
pred = model.predict(X_test_scaled)
pred
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step
array([[1.49767114e-08],
[9.99996364e-01],
[9.99989510e-01],
[2.34794206e-05],
[3.26990204e-07],
[4.63475835e-16],
[1.14782417e-09],
[1.37853049e-05],
[2.10904373e-38],
[1.22355688e-17],
[9.47022784e-07],
[9.99997079e-01],
[1.48535608e-19],
[8.24223605e-08],
[1.30682758e-12],
[2.53171286e-11],
[2.50650012e-09],
[1.04336220e-18],
[1.50063850e-09],
[7.22405827e-03],
[1.94349457e-02],
[1.44995431e-12],
[1.44852375e-14],
[9.99998510e-01],
[1.33410865e-03],
[9.95549619e-01],
[4.30452385e-09],
[9.98654902e-01],
[1.21342828e-05],
[1.40959961e-11],
[0.00000000e+00],
[4.76045159e-07],
[6.32333918e-04],
[8.82855767e-34],
[2.98582371e-38],
[2.62632151e-04],
[9.96635616e-01],
[9.99975324e-01],
[9.99907136e-01],
[3.64588294e-03],
[5.00209283e-21],
[9.99977410e-01],
[6.08151424e-07],
[3.18250254e-25],
[4.81809916e-16],
[9.93347764e-01],
[5.17030969e-08],
[2.40078335e-12],
[2.24314720e-31],
[8.48743250e-04],
[9.99987245e-01],
[1.42118861e-23],
[1.10301725e-22],
[1.27917781e-04],
[9.99976397e-01],
[2.66162259e-10],
[9.99967813e-01],
[1.58057641e-03],
[3.00077357e-21],
[7.45777082e-26],
[9.99976039e-01],
[9.48741763e-06],
[1.22351004e-04],
[5.98280679e-17],
[2.03145985e-02],
[1.58556546e-08],
[1.00000000e+00],
[3.51858544e-05],
[9.99987364e-01],
[0.00000000e+00],
[1.87447051e-06],
[2.12559598e-13],
[2.09591460e-23],
[3.75203774e-13],
[2.69616027e-08],
[1.20476494e-02],
[1.77243962e-08],
[9.99995887e-01],
[3.09584647e-09],
[3.29492544e-03],
[8.02365970e-03],
[0.00000000e+00],
[4.81129391e-03],
[1.57202094e-15],
[2.51497738e-08],
[9.96932209e-01],
[9.73208249e-01],
[0.00000000e+00],
[7.90176928e-05],
[1.32435230e-12],
[1.03168841e-03],
[9.99996364e-01],
[1.41271278e-16],
[9.94130254e-01],
[1.03362196e-03],
[8.39420547e-13],
[5.24250609e-05],
[9.99992847e-01],
[3.38503151e-13],
[9.92281199e-01],
[9.93868589e-01],
[3.86682612e-19],
[1.21180094e-16],
[1.97507648e-22],
[9.65301096e-01],
[9.99989212e-01],
[2.72299421e-37],
[1.53624110e-11],
[2.29853402e-15],
[9.99976695e-01],
[9.91829634e-01],
[1.46108699e-23],
[1.51821857e-15],
[6.18548249e-14],
[7.62617029e-03],
[9.99977469e-01],
[1.01631746e-15],
[2.30706161e-13],
[9.93021846e-01],
[4.26051875e-29],
[9.97312963e-01],
[9.88815129e-01],
[0.00000000e+00],
[2.55372271e-25],
[5.73282903e-13],
[2.15930594e-15],
[7.59117454e-02],
[5.33174352e-17],
[9.41204607e-01],
[5.49186974e-10],
[4.84825259e-20],
[2.93265412e-09],
[2.23284303e-12],
[1.59339200e-32],
[9.95716528e-06],
[9.98423755e-01],
[9.97991085e-01],
[9.99936044e-01],
[4.63728655e-09],
[8.33506278e-13],
[5.90753843e-05],
[5.11248027e-06],
[5.72733223e-01],
[3.26046045e-32],
[9.99992728e-01],
[2.55879841e-21],
[2.52899272e-17],
[9.99996662e-01],
[1.93697922e-02],
[1.30361496e-02],
[2.79380589e-30],
[3.88123267e-09],
[7.10944172e-34],
[9.78153795e-02],
[9.99908388e-01],
[1.31809007e-04],
[6.02606923e-11],
[8.24615515e-11],
[2.59907931e-01],
[9.99790788e-01],
[6.59610494e-04],
[4.44521662e-04],
[2.67849828e-06],
[9.99981225e-01],
[3.49268946e-03],
[9.99951541e-01],
[1.55403186e-12],
[9.99987900e-01],
[2.35916131e-09],
[9.68408644e-01],
[4.40323877e-14],
[2.69645341e-02],
[9.36884437e-09],
[9.98465776e-01],
[7.43292091e-18],
[5.03324941e-07],
[2.56846278e-08],
[1.61483132e-12],
[3.24631259e-07],
[9.55076871e-07],
[4.07488848e-13],
[9.02817305e-07],
[1.95152041e-13],
[9.99916255e-01],
[1.11427201e-09],
[8.36632680e-03],
[9.94852602e-01],
[1.17478777e-14],
[5.21913250e-07],
[6.39253904e-18],
[5.01019948e-10],
[9.95687246e-01],
[6.14100462e-03],
[9.99992430e-01],
[2.36836559e-06],
[9.99943733e-01],
[1.53158235e-05],
[9.99977529e-01],
[6.08034456e-11],
[0.00000000e+00],
[7.45354271e-07],
[4.38922256e-31],
[9.93912041e-01],
[0.00000000e+00],
[6.47000771e-20],
[9.99987125e-01],
[4.42098162e-24],
[7.86738191e-03],
[5.20298757e-26],
[9.74497606e-31],
[9.99940813e-01],
[9.83121693e-01],
[9.99517262e-01],
[4.81714703e-11],
[2.12542174e-04],
[1.46269752e-17],
[4.27205709e-14],
[3.30934698e-22],
[7.81060815e-01],
[4.79755229e-11],
[5.99481564e-06],
[9.98857737e-01],
[9.99987185e-01],
[9.99947667e-01],
[1.73099844e-15],
[9.99998868e-01],
[2.88991927e-04],
[9.89477634e-01],
[9.99995291e-01],
[4.88308300e-16],
[1.05363481e-23],
[8.00057555e-30],
[1.12310867e-03],
[1.34212414e-05],
[2.22730218e-04],
[1.65804653e-04],
[1.19396466e-17],
[9.99995232e-01],
[1.74745335e-04],
[1.04058989e-14],
[9.80576673e-22],
[9.99936700e-01],
[1.01002315e-05],
[4.59906581e-13],
[9.52170491e-01],
[3.81837339e-07],
[4.42785386e-04],
[2.36423517e-07],
[9.80862260e-01],
[4.70835261e-07],
[2.17549299e-24],
[8.76087185e-08],
[9.99983728e-01],
[2.34075433e-05],
[2.33462290e-03],
[7.09961325e-24],
[1.69196667e-07],
[9.99958098e-01],
[9.99980152e-01],
[9.99996781e-01],
[9.99981403e-01],
[9.30789160e-04],
[9.54221785e-01],
[6.60807589e-11],
[4.88410133e-26],
[6.79721710e-19],
[2.33659174e-16],
[5.79205283e-04],
[9.68092799e-01],
[5.92151952e-11],
[1.45777332e-17],
[1.46390852e-02],
[6.23501725e-22],
[6.57781857e-06],
[2.29034387e-03],
[2.33601115e-17],
[9.99997020e-01],
[2.96037251e-05],
[9.99976873e-01],
[2.85196786e-20],
[1.39045928e-18],
[9.99954462e-01],
[9.84932721e-01],
[3.13463433e-09],
[9.17021228e-17],
[9.99952197e-01],
[9.99993145e-01],
[2.25755764e-04],
[2.09895666e-06],
[1.81972926e-12],
[2.09203805e-04],
[1.08529506e-15],
[9.78815973e-01],
[2.25965584e-22],
[9.75841741e-10],
[3.77770519e-30],
[1.65859988e-15],
[4.11712192e-03],
[1.53636131e-06],
[2.08774209e-02],
[1.35833941e-15],
[9.61844790e-08],
[2.55301922e-34],
[2.87332514e-04],
[1.62998192e-27],
[4.29834199e-06],
[5.29569425e-02],
[4.29428385e-19],
[1.52198347e-28],
[9.95288432e-01],
[9.99964297e-01],
[5.27848978e-13],
[8.04141820e-09],
[1.53164336e-07],
[4.19612347e-17],
[6.81271078e-04],
[9.99977112e-01],
[6.10742706e-11],
[1.28324442e-02],
[9.99957740e-01],
[1.84369998e-15]], dtype=float32)
pred = np.where(pred > 0.5, 1 ,0)
pred
array([[0],
[1],
[1],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[1],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[1],
[0],
[1],
[0],
[1],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[1],
[1],
[1],
[0],
[0],
[1],
[0],
[0],
[0],
[1],
[0],
[0],
[0],
[0],
[1],
[0],
[0],
[0],
[1],
[0],
[1],
[0],
[0],
[0],
[1],
[0],
[0],
[0],
[0],
[0],
[1],
[0],
[1],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[1],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[1],
[1],
[0],
[0],
[0],
[0],
[1],
[0],
[1],
[0],
[0],
[0],
[1],
[0],
[1],
[1],
[0],
[0],
[0],
[1],
[1],
[0],
[0],
[0],
[1],
[1],
[0],
[0],
[0],
[0],
[1],
[0],
[0],
[1],
[0],
[1],
[1],
[0],
[0],
[0],
[0],
[0],
[0],
[1],
[0],
[0],
[0],
[0],
[0],
[0],
[1],
[1],
[1],
[0],
[0],
[0],
[0],
[1],
[0],
[1],
[0],
[0],
[1],
[0],
[0],
[0],
[0],
[0],
[0],
[1],
[0],
[0],
[0],
[0],
[1],
[0],
[0],
[0],
[1],
[0],
[1],
[0],
[1],
[0],
[1],
[0],
[0],
[0],
[1],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[1],
[0],
[0],
[1],
[0],
[0],
[0],
[0],
[1],
[0],
[1],
[0],
[1],
[0],
[1],
[0],
[0],
[0],
[0],
[1],
[0],
[0],
[1],
[0],
[0],
[0],
[0],
[1],
[1],
[1],
[0],
[0],
[0],
[0],
[0],
[1],
[0],
[0],
[1],
[1],
[1],
[0],
[1],
[0],
[1],
[1],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[1],
[0],
[0],
[0],
[1],
[0],
[0],
[1],
[0],
[0],
[0],
[1],
[0],
[0],
[0],
[1],
[0],
[0],
[0],
[0],
[1],
[1],
[1],
[1],
[0],
[1],
[0],
[0],
[0],
[0],
[0],
[1],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[1],
[0],
[1],
[0],
[0],
[1],
[1],
[0],
[0],
[1],
[1],
[0],
[0],
[0],
[0],
[0],
[1],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[0],
[1],
[1],
[0],
[0],
[0],
[0],
[0],
[1],
[0],
[0],
[1],
[0]])
print(classification_report(y_test,pred))
precision recall f1-score support
0.0 0.98 0.97 0.98 237
1.0 0.92 0.95 0.94 84
accuracy 0.97 321
macro avg 0.95 0.96 0.96 321
weighted avg 0.97 0.97 0.97 321
sns.heatmap(confusion_matrix(y_test,pred),annot=True,fmt='d')
<AxesSubplot:>
We've obtained a really strong model, even better than what we achieved with Support Vector Machines. Achieving an accuracy of around 97% on the test set is very satisfactory for us. We won't be testing any further with this classification of gradient networks; we'll now move on to Extreme Learning Machine (ELM), which is a type of Single Hidden Layer Feedforward Network (SLFN). Our ELM class is located in the file ELM.py, which we have already imported at the beginning. It's worth mentioning that during initialization, it takes three main input parameters: the number of features, the number of hidden neurons, and the number of classes.
model2 = ELM(
14,
200,
2
)
y_train_onehot = to_categorical(y_train,2)
y_test_onehot = to_categorical(y_test,2)
model2.fit(X_train_scaled,y_train_onehot)
train_pred = model2.pred(X_train_scaled)
test_pred = model2.pred(X_test_scaled)
model2.acc(train_pred,y_train),model2.acc(test_pred,y_test)
(0.959375, 0.9221183800623053)
As we can see, with just a small number of hidden neurons, we achieved a model that is 92% accurate on the test set, making it a strong competitor for SVM. And remember, we're not introducing any additional parameters here. We can now check what the confusion matrix looks like for our results.
model2.cm(test_pred, y_test)
print(classification_report(y_test, test_pred))
precision recall f1-score support
0.0 0.93 0.95 0.94 237
1.0 0.86 0.81 0.83 84
accuracy 0.92 321
macro avg 0.90 0.88 0.89 321
weighted avg 0.91 0.92 0.92 321
It seems quite satisfying to me, although the classes themselves are not evenly balanced, our model generally makes mistakes evenly, both with healthy and sick individuals. Let's now try to conduct something akin to grid search, but without using cross-validation. Since our function doesn't come from the scikit-learn package, we'll write a short code that allows us to perform such grid search.
accuracy_list = pd.DataFrame(columns=['Number of features','Number of hidden neurons','Accuracy'])
for j in range(50,400,10):
for i in [7,8,9,10,11,12,13,14]:
pca = PCA(n_components=i).fit(X_train_scaled)
X_train_pca = pca.transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)
model = ELM(
i,
j,
9
)
model.fit(X_train_pca, y_train_onehot)
test_pred = model.pred(X_test_pca)
accuracy = model.acc(test_pred, y_test)
new_row = pd.DataFrame(columns=['Number of features','Number of hidden neurons','Accuracy'],data=[[i,j,accuracy]])
accuracy_list = pd.concat([accuracy_list,new_row],ignore_index=True)
accuracy_list.loc[accuracy_list['Accuracy'] == accuracy_list['Accuracy'].max()]
C:\Users\pawel.drzyzga\AppData\Local\Temp\ipykernel_6588\3928860975.py:17: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation. accuracy_list = pd.concat([accuracy_list,new_row],ignore_index=True)
| Number of features | Number of hidden neurons | Accuracy | |
|---|---|---|---|
| 133 | 12 | 210 | 0.931464 |
| 135 | 14 | 210 | 0.931464 |
| 219 | 10 | 320 | 0.931464 |
model = ELM(10,250,9)
pca = PCA(n_components=10).fit(X_train_scaled)
X_train_pca = pca.transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)
model.fit(X_train_pca,y_train_onehot)
pred = model.pred(X_test_pca)
plt.figure(figsize=(10,5))
sns.heatmap(confusion_matrix(y_test,pred), annot=True, fmt='d')
plt.xlabel('Predict')
plt.ylabel('Actuval Value')
Text(95.72222222222221, 0.5, 'Actuval Value')
Our search led us to find a model that generalizes effectively at a level of over 93%, which may not be as good as what we achieved with SVM, but it certainly can be promising. It's worth noting that the ELM itself is quite "rudimentary" in our case; if we were to expand it, add parameters, we could achieve even better results. Interestingly, the best result was obtained for a model with the parameter n_components = 10.
3. Multiclass classification¶
At the outset, of course, we need to get rid of the column that corresponds to our binary class division.
data2 = data.drop(columns=['Healthy'])
data2
| WBC | LYMp | NEUTp | LYMn | NEUTn | RBC | HGB | HCT | MCV | MCH | MCHC | PLT | PDW | PCT | Diagnosis | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10.00 | 43.200 | 50.100 | 4.30000 | 5.00000 | 2.77 | 7.3 | 24.2000 | 87.7 | 26.3 | 30.1 | 189.0 | 12.500000 | 0.17000 | Normocytic hypochromic anemia |
| 1 | 10.00 | 42.400 | 52.300 | 4.20000 | 5.30000 | 2.84 | 7.3 | 25.0000 | 88.2 | 25.7 | 20.2 | 180.0 | 12.500000 | 0.16000 | Normocytic hypochromic anemia |
| 2 | 7.20 | 30.700 | 60.700 | 2.20000 | 4.40000 | 3.97 | 9.0 | 30.5000 | 77.0 | 22.6 | 29.5 | 148.0 | 14.300000 | 0.14000 | Iron deficiency anemia |
| 3 | 6.00 | 30.200 | 63.500 | 1.80000 | 3.80000 | 4.22 | 3.8 | 32.8000 | 77.9 | 23.2 | 29.8 | 143.0 | 11.300000 | 0.12000 | Iron deficiency anemia |
| 4 | 4.20 | 39.100 | 53.700 | 1.60000 | 2.30000 | 3.93 | 0.4 | 316.0000 | 80.6 | 23.9 | 29.7 | 236.0 | 12.800000 | 0.22000 | Normocytic hypochromic anemia |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1276 | 4.40 | 25.845 | 77.511 | 1.88076 | 5.14094 | 4.86 | 13.5 | 46.1526 | 80.7 | 27.7 | 34.4 | 180.0 | 14.312512 | 0.26028 | Healthy |
| 1277 | 5.60 | 25.845 | 77.511 | 1.88076 | 5.14094 | 4.85 | 15.0 | 46.1526 | 91.7 | 31.0 | 33.8 | 215.0 | 14.312512 | 0.26028 | Healthy |
| 1278 | 9.20 | 25.845 | 77.511 | 1.88076 | 5.14094 | 4.47 | 13.1 | 46.1526 | 88.7 | 29.3 | 33.0 | 329.0 | 14.312512 | 0.26028 | Healthy |
| 1279 | 6.48 | 25.845 | 77.511 | 1.88076 | 5.14094 | 4.75 | 13.2 | 46.1526 | 86.7 | 27.9 | 32.1 | 174.0 | 14.312512 | 0.26028 | Healthy |
| 1280 | 8.80 | 25.845 | 77.511 | 1.88076 | 5.14094 | 4.95 | 15.2 | 46.1526 | 89.7 | 30.6 | 34.2 | 279.0 | 14.312512 | 0.26028 | Healthy |
1281 rows × 15 columns
Now we will encode our data, which is essential when it comes to entering the topic of multiclass classification. Additionally, we will display our labels, which can help us in later interpretation.
data2['Diagnosis_cod'] = LabelEncoder().fit_transform(data2['Diagnosis'])
mapping_dict = {data2['Diagnosis'].unique()[i]: data2['Diagnosis_cod'].unique()[i] for i in range(len(data2['Diagnosis'].unique()))}
X2 = data2.drop(columns=['Diagnosis','Diagnosis_cod'])
y2 = data2['Diagnosis_cod']
mapping_dict
{'Normocytic hypochromic anemia': 5,
'Iron deficiency anemia': 1,
'Other microcytic anemia': 7,
'Leukemia': 2,
'Healthy': 0,
'Thrombocytopenia': 8,
'Normocytic normochromic anemia': 6,
'Leukemia with thrombocytopenia': 3,
'Macrocytic anemia': 4}
Now we split the data and then apply one-hot encoding, which is necessary if we want to use our Extreme Learning Machine. We didn't do this earlier because we only had classes 0 and 1.
X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y2, random_state=42, stratify=y2)
y2_train_onehot = to_categorical(y2_train, 9)
y2_test_onehot = to_categorical(y2_test, 9)
scaler = StandardScaler().fit(X2_train)
X2_train_scaled = scaler.transform(X2_train)
X2_test_scaled = scaler.transform(X2_test)
We're starting by applying the Extreme Learning Machine, using random settings for now. Let's see what we come up with.
model = ELM(
14,
175,
9
)
model.fit(X2_train_scaled, y2_train_onehot)
test_pred = model.pred(X2_test_scaled)
plt.figure(figsize=(10,5))
sns.heatmap(confusion_matrix(y2_test,test_pred), annot=True, fmt='d')
plt.xlabel('Predict')
plt.ylabel('Actuval Value')
Text(95.72222222222221, 0.5, 'Actuval Value')
model.acc(test_pred, y2_test)
0.7570093457943925
To start, we're getting a model that has 75% accuracy on the training set. This isn't a bad result. We need to consider that we've entered into multiclass classification, which means not only do we have 9 classes instead of 2, but also that the imbalance in class sizes (as we observed at the beginning) may leave much to be desired. However, there's no need to despair; let's proceed to apply our grid search to try to find the best configuration for our Extreme Learning Machine.
accuracy_list = pd.DataFrame(columns=['Number of features','Number of hidden neurons','Accuracy'])
for j in range(100,350,5):
for i in [10,11,12,13,14]:
pca = PCA(n_components=i).fit(X2_train_scaled)
X2_train_pca = pca.transform(X2_train_scaled)
X2_test_pca = pca.transform(X2_test_scaled)
model = ELM(
i,
j,
9
)
model.fit(X2_train_pca, y2_train_onehot)
test_pred = model.pred(X2_test_pca)
accuracy = model.acc(test_pred, y2_test)
new_row = pd.DataFrame(columns=['Number of features','Number of hidden neurons','Accuracy'],data=[[i,j,accuracy]])
accuracy_list = pd.concat([accuracy_list,new_row],ignore_index=True)
C:\Users\pawel.drzyzga\AppData\Local\Temp\ipykernel_6588\2139844260.py:16: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation. accuracy_list = pd.concat([accuracy_list,new_row],ignore_index=True)
accuracy_list.loc[accuracy_list['Accuracy'] == accuracy_list['Accuracy'].max()]
| Number of features | Number of hidden neurons | Accuracy | |
|---|---|---|---|
| 138 | 13 | 235 | 0.813084 |
pca = PCA(n_components=12).fit(X2_train_scaled)
X2_train_pca = pca.transform(X2_train_scaled)
X2_test_pca = pca.transform(X2_test_scaled)
model = ELM(
12,
220,
9
)
model.fit(X2_train_pca, y2_train_onehot)
test_pred = model.pred(X2_test_pca)
model.cm(test_pred, y2_test)
We receive a slightly improved model that accurately generalizes in over 81% of cases. This gives us a 6 percentage point improvement, which is definitely a good result. It's not the same as with SVM, but here we had much less to do in terms of parameter tuning, and the training time was also significantly shorter. Now let's move on to logistic regression and see how it performs. First, we'll try it with default settings, and then we'll explore some options for the regularization parameter.
LogisticRegression().fit(X2_train_scaled,y2_train).score(X2_test_scaled,y2_test)
0.7320872274143302
pipeline = make_pipeline(
StandardScaler(),
PCA(),
LogisticRegression()
)
param_grid =[
{
'pca__n_components': [10, 11, 12, 13, 14],
'logisticregression__C': [0.01, 0.1, 1, 10, 100, 1000, 5000, 10000, 50000],
'logisticregression__max_iter': [1000]
}
]
gridsearch1 = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=10).fit(X2_train,y2_train)
c:\Users\pawel.drzyzga\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 8 members, which is less than n_splits=10. warnings.warn(
gridsearch1.best_estimator_
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('logisticregression',
LogisticRegression(C=5000, max_iter=1000))])In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('logisticregression',
LogisticRegression(C=5000, max_iter=1000))])StandardScaler()
PCA(n_components=14)
LogisticRegression(C=5000, max_iter=1000)
gridsearch1.score(X2_test, y2_test)
0.8068535825545171
sns.heatmap(confusion_matrix(y2_test, gridsearch1.predict(X2_test)), annot=True)
<Axes: >
Initially, we get a model that predicts with an accuracy of 73%, which is the worst result we've obtained so far. However, using GridSearchCV, we attempt to find the best parameter tuning, and eventually, our accuracy increases to 80%. A 7 percentage point increase is quite substantial for such a result. It's nearly the same as with the Extreme Learning Machine, but I believe there's definitely more potential for improvement with ELM. Let's now move directly to SVM. We won't create a model with default parameters; instead, we'll start with grid search right away, then analyze it, and step by step try to find the best model.
pipeline1 = make_pipeline(
StandardScaler(),
PCA(),
SVC(probability=True)
)
param_grid =[
{
'pca__n_components': [10,11,12,14],
"svc__kernel": ['rbf'],
'svc__C': [0.001, 0.01, 0.1, 1, 10, 50, 100,500,1000],
'svc__gamma': [0.0001,0.001, 0.01, 0.1, 1, 10, 50, 100]
},
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=10).fit(X2_train,y2_train)
gridsearch1.best_estimator_
c:\Users\pawel.drzyzga\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 8 members, which is less than n_splits=10. warnings.warn(
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc', SVC(C=500, gamma=0.01))])In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc', SVC(C=500, gamma=0.01))])StandardScaler()
PCA(n_components=14)
SVC(C=500, gamma=0.01)
gridsearch1.best_score_
0.8791666666666667
gridsearch1.score(X2_test,y2_test)
0.8878504672897196
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")
sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
sns.heatmap(confusion_matrix(y2_test, gridsearch1.predict(X2_test)), annot=True)
<Axes: >
At the very beginning, we already receive a quite strong model, definitely the best among those we've created. However, we notice that the accuracy values start to increase as we move towards the northeast direction. Let's try to steer in that direction and find an even better model.
pipeline1 = make_pipeline(
StandardScaler(),
PCA(),
SVC(probability=True)
)
param_grid =[
{
'pca__n_components': [14],
"svc__kernel": ['rbf'],
'svc__C': np.linspace(10,5000,15),
'svc__gamma': np.linspace(.0001,.1,15)
},
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=10).fit(X2_train,y2_train)
gridsearch1.best_estimator_
c:\Users\pawel.drzyzga\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 8 members, which is less than n_splits=10. warnings.warn(
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc', SVC(C=3930.714285714286, gamma=0.007235714285714286))])In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc', SVC(C=3930.714285714286, gamma=0.007235714285714286))])StandardScaler()
PCA(n_components=14)
SVC(C=3930.714285714286, gamma=0.007235714285714286)
gridsearch1.score(X2_test,y2_test)
0.9003115264797508
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")
plt.figure(figsize=(15,10))
sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
sns.heatmap(confusion_matrix(y2_test, gridsearch1.predict(X2_test)), annot=True)
<Axes: >
Our intuition was not wrong; we've already gained a few percentage points on the test set. Just by looking at the heatmap, it's apparent that it might be getting difficult to further improve the model, but let's try expanding the range of parameter search a bit more. Perhaps we'll manage to unearth something.
pipeline1 = make_pipeline(
StandardScaler(),
PCA(),
SVC(probability=True)
)
param_grid =[
{
'pca__n_components': [14],
"svc__kernel": ['rbf'],
'svc__C': np.linspace(1500,50000,15),
'svc__gamma': np.linspace(.0001,.005,15)
},
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=10).fit(X2_train,y2_train)
gridsearch1.best_estimator_
c:\Users\pawel.drzyzga\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 8 members, which is less than n_splits=10. warnings.warn(
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc', SVC(C=50000.0, gamma=0.00115))])In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc', SVC(C=50000.0, gamma=0.00115))])StandardScaler()
PCA(n_components=14)
SVC(C=50000.0, gamma=0.00115)
gridsearch1.score(X2_test,y2_test)
0.9034267912772586
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")
plt.figure(figsize=(15,10))
sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
pipeline1 = make_pipeline(
StandardScaler(),
PCA(),
SVC(probability=True)
)
param_grid =[
{
'pca__n_components': [14],
"svc__kernel": ['rbf'],
'svc__C': np.linspace(20000,170000,15),
'svc__gamma': np.linspace(.0001,.00325,10)
},
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=10).fit(X2_train,y2_train)
gridsearch1.best_estimator_
c:\Users\pawel.drzyzga\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 8 members, which is less than n_splits=10. warnings.warn(
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc', SVC(C=127142.85714285713, gamma=0.0008))])In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
('pca', PCA(n_components=14)),
('svc', SVC(C=127142.85714285713, gamma=0.0008))])StandardScaler()
PCA(n_components=14)
SVC(C=127142.85714285713, gamma=0.0008)
gridsearch1.score(X2_test,y2_test)
0.9096573208722741
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")
plt.figure(figsize=(25,10))
sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
sns.heatmap(confusion_matrix(y2_test, gridsearch1.predict(X2_test)), annot=True)
<Axes: >
Finally, we find what is arguably the best model we can currently achieve using SVM, and if not the best, then certainly very satisfactory. An accuracy level of over 90% in multiclass classification is certainly optimistic. The obstacle to achieving a higher score may simply be that some classes are very small in size. We can try to use gradient neural networks again.
model = Sequential()
model.add(layers.Flatten(input_shape=(14,1)))
model.add(layers.Dense(112, activation='relu'))
model.add(Dropout(0.2))
model.add(layers.Dense(112, activation='relu'))
model.add(Dropout(0.2))
model.add(layers.Dense(112, activation='relu'))
model.add(Dropout(0.2))
model.add(layers.Dense(9, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
/home/pawel/.local/lib/python3.10/site-packages/keras/src/layers/reshaping/flatten.py:37: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead. super().__init__(**kwargs)
early_stop = EarlyStopping(monitor ='val_loss',mode='min',verbose=1,patience=25)
model.fit(x = X2_train_scaled,y = y2_train_onehot,batch_size=2,validation_data=(X2_test_scaled,y2_test_onehot),epochs=100,callbacks=[early_stop])
Epoch 1/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.3647 - loss: 1.8143 - val_accuracy: 0.6417 - val_loss: 1.2572 Epoch 2/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.6200 - loss: 1.1091 - val_accuracy: 0.7134 - val_loss: 0.8487 Epoch 3/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.7260 - loss: 0.8441 - val_accuracy: 0.7944 - val_loss: 0.7356 Epoch 4/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.7751 - loss: 0.6654 - val_accuracy: 0.7664 - val_loss: 0.7381 Epoch 5/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.7586 - loss: 0.6412 - val_accuracy: 0.8287 - val_loss: 0.6162 Epoch 6/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8283 - loss: 0.4976 - val_accuracy: 0.8100 - val_loss: 0.5986 Epoch 7/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8258 - loss: 0.5001 - val_accuracy: 0.8318 - val_loss: 0.9454 Epoch 8/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8234 - loss: 0.5284 - val_accuracy: 0.8474 - val_loss: 0.7701 Epoch 9/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8571 - loss: 0.4271 - val_accuracy: 0.8474 - val_loss: 0.6140 Epoch 10/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8492 - loss: 0.4168 - val_accuracy: 0.8411 - val_loss: 0.7402 Epoch 11/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8679 - loss: 0.3667 - val_accuracy: 0.8536 - val_loss: 0.6468 Epoch 12/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8385 - loss: 0.3734 - val_accuracy: 0.8505 - val_loss: 0.6060 Epoch 13/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8952 - loss: 0.3370 - val_accuracy: 0.8723 - val_loss: 0.6300 Epoch 14/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8853 - loss: 0.3218 - val_accuracy: 0.8847 - val_loss: 0.7179 Epoch 15/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8827 - loss: 0.3035 - val_accuracy: 0.8536 - val_loss: 0.6227 Epoch 16/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9007 - loss: 0.2962 - val_accuracy: 0.8910 - val_loss: 0.4081 Epoch 17/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9250 - loss: 0.2346 - val_accuracy: 0.8879 - val_loss: 0.5004 Epoch 18/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8982 - loss: 0.2848 - val_accuracy: 0.8692 - val_loss: 0.4343 Epoch 19/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9194 - loss: 0.2377 - val_accuracy: 0.8692 - val_loss: 0.5852 Epoch 20/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9460 - loss: 0.1755 - val_accuracy: 0.8816 - val_loss: 0.4335 Epoch 21/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9148 - loss: 0.2546 - val_accuracy: 0.8785 - val_loss: 0.6249 Epoch 22/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9201 - loss: 0.2323 - val_accuracy: 0.8785 - val_loss: 0.7322 Epoch 23/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9264 - loss: 0.2127 - val_accuracy: 0.8692 - val_loss: 0.6674 Epoch 24/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9313 - loss: 0.1978 - val_accuracy: 0.8754 - val_loss: 0.6482 Epoch 25/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9423 - loss: 0.1780 - val_accuracy: 0.8816 - val_loss: 0.6170 Epoch 26/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9283 - loss: 0.2024 - val_accuracy: 0.8847 - val_loss: 0.5863 Epoch 27/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9245 - loss: 0.2224 - val_accuracy: 0.8910 - val_loss: 0.4938 Epoch 28/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9307 - loss: 0.2254 - val_accuracy: 0.8972 - val_loss: 0.5071 Epoch 29/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9320 - loss: 0.1933 - val_accuracy: 0.8754 - val_loss: 0.4960 Epoch 30/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9339 - loss: 0.1872 - val_accuracy: 0.8910 - val_loss: 0.5179 Epoch 31/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9485 - loss: 0.1907 - val_accuracy: 0.8879 - val_loss: 0.5998 Epoch 32/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9368 - loss: 0.2432 - val_accuracy: 0.8941 - val_loss: 0.6715 Epoch 33/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9544 - loss: 0.1266 - val_accuracy: 0.8941 - val_loss: 0.6721 Epoch 34/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9293 - loss: 0.1896 - val_accuracy: 0.9097 - val_loss: 0.6837 Epoch 35/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9531 - loss: 0.1456 - val_accuracy: 0.8567 - val_loss: 0.8032 Epoch 36/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9419 - loss: 0.1533 - val_accuracy: 0.9034 - val_loss: 0.6545 Epoch 37/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9436 - loss: 0.1395 - val_accuracy: 0.8723 - val_loss: 0.7557 Epoch 38/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9526 - loss: 0.1871 - val_accuracy: 0.8692 - val_loss: 0.8177 Epoch 39/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9475 - loss: 0.1882 - val_accuracy: 0.8972 - val_loss: 0.6315 Epoch 40/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9490 - loss: 0.1988 - val_accuracy: 0.8879 - val_loss: 0.6939 Epoch 41/100 480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9470 - loss: 0.1623 - val_accuracy: 0.9097 - val_loss: 0.5777 Epoch 41: early stopping
<keras.src.callbacks.history.History at 0x7965242adc00>
pd.DataFrame(model.history.history).plot()
<AxesSubplot:>
model.evaluate(X2_test_scaled,y2_test_onehot)
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9094 - loss: 0.4510
[0.5776535272598267, 0.9096572995185852]
pred = model.predict(X2_test_scaled)
pred = np.argmax(pred,axis=1)
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
sns.heatmap(confusion_matrix(y2_test,pred),annot=True)
<AxesSubplot:>
To very short conclude, we received an model with a similarly accuracy as the best SVM model.
4. Concluding Remarks on the Entire Project¶
Project Objective¶
The aim of the project was to classify medical data related to blood parameters to accurately diagnose patients' health conditions. Various machine learning algorithms were applied to determine which one best predicts outcomes based on input data.
Initial Data Analysis¶
The data included significant features such as platelet count (PLT), hemoglobin concentration (HGB), and other blood parameters. Exploratory data analysis (EDA) revealed that some features have strong correlations with medical diagnoses.
Data Transformation¶
- Outcome variables were binary encoded, facilitating modeling.
- Data standardization and principal component analysis (PCA) were applied to reduce dimensionality and information redundancy.
Modeling¶
- Logistic regression and support vector machine (SVM) with different kernels (RBF and linear) were used for classification, along with a simple neural network from the TensorFlow package, and the Extreme Learning Machine.
- Data were split into training and test sets to assess overall model performance.
Results¶
- Logistic regression achieved an accuracy of 87% on the test set, indicating solid, though not ideal, performance.
- SVM with RBF kernel achieved the highest accuracy at 95%, suggesting it's the most effective model in this case.
- Gradient-based neural network also showed fairly high accuracy, very close to SVM with RBF kernel in multiclass, but slightly higher than SVM in binary classification.
- Extreme Learning Machine was the biggest surprise of the project, with its surprisingly high accuracy given its simplicity.
Visualization of Results¶
- Confusion matrices and PCA plots helped visualize how well models perform in classification and which classes are most frequently misclassified.
- ROC curves and AUC values showed that models distinguish between classes well, with SVM with RBF kernel having the highest AUC value.
Key Insights¶
- Model Selection: Gradient-based neural network proved to be the best model for classifying medical data, achieving the highest accuracy and best results in other evaluation metrics.
- Feature Importance: Features such as platelet count (PLT) and hemoglobin concentration (HGB) play a crucial role in diagnosing health conditions, as confirmed by high correlation coefficients and importance in models.
- Data Transformation: Data standardization and dimensionality reduction through PCA improved model performance, suggesting these steps are important in processing medical data.
- Application of Machine Learning: Machine learning models, such as SVM or Gradient-based neural network, are effective in classifying medical data and can be applied in clinical practice to assist with diagnoses.
In conclusion, the project demonstrated that appropriately chosen and processed machine learning algorithms can significantly improve health condition diagnoses based on medical data.